{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "deletable": true,
    "editable": true,
    "id": "mHF9VCProKJN"
   },
   "source": [
    "# AI Explanations: Explaining a tabular data model\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "deletable": true,
    "editable": true,
    "id": "hZzRVxNtH-zG"
   },
   "source": [
    "## Overview\n",
    "\n",
    "In this tutorial we will perform the following steps:\n",
    "\n",
    "1. Build and train a Keras model.\n",
    "1. Export the Keras model as a TF 1 SavedModel and deploy the model on Cloud AI Platform.\n",
    "1. Compute explainations for our model's predictions using Explainable AI on Cloud AI Platform."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "deletable": true,
    "editable": true,
    "id": "iN69d4D9Flrh"
   },
   "source": [
    "### Dataset\n",
    "\n",
    "The dataset used for this tutorial was created from a BigQuery Public Dataset: [NYC 2018 Yellow Taxi data](https://console.cloud.google.com/bigquery?filter=solution-type:dataset&q=nyc%20taxi&id=e4902dee-0577-42a0-ac7c-436c04ea50b6&subtask=details&subtaskValue=city-of-new-york%2Fnyc-tlc-trips&project=michaelabel-gcp-training&authuser=1&subtaskIndex=3). "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "deletable": true,
    "editable": true,
    "id": "Su2qu-4CW-YH"
   },
   "source": [
    "### Objective\n",
    "\n",
    "The goal is to train a model using the Keras Sequential API that predicts how much a customer is compelled to pay (fares + tolls) for a taxi ride given the pickup location, dropoff location, the day of the week, and the hour of the day.\n",
    "\n",
    "This tutorial focuses more on deploying the model to AI Explanations than on the design of the model itself. We will be using preprocessed data for this lab. If you wish to know more about the data and how it was preprocessed please see this [notebook](https://github.com/GoogleCloudPlatform/training-data-analyst/blob/master/courses/machine_learning/deepdive/01_bigquery/c_extract_and_benchmark.ipynb).\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "deletable": true,
    "editable": true,
    "id": "TSy-f05IO4LB"
   },
   "source": [
    "### Setup"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "both",
    "colab": {},
    "colab_type": "code",
    "deletable": true,
    "editable": true,
    "id": "4qxwBA4RM9Lu"
   },
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "PROJECT_ID = \"\" # TODO: your PROJECT_ID here.\n",
    "os.environ[\"PROJECT_ID\"] = PROJECT_ID"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "deletable": true,
    "editable": true,
    "id": "bTxmbDg1I0x1"
   },
   "outputs": [],
   "source": [
    "BUCKET_NAME = \"\" # TODO: your BUCKET_NAME here. \n",
    "REGION = \"us-central1\"\n",
    "\n",
    "os.environ['BUCKET_NAME'] = BUCKET_NAME\n",
    "os.environ['REGION'] = REGION"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "deletable": true,
    "editable": true,
    "id": "fsmCk2dwJnLZ"
   },
   "source": [
    "Run the following cell to create your Cloud Storage bucket if it does not already exist."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "deletable": true,
    "editable": true,
    "id": "160PRO3aJqLD"
   },
   "outputs": [],
   "source": [
    "%%bash\n",
    "exists=$(gsutil ls -d | grep -w gs://${BUCKET_NAME}/)\n",
    "\n",
    "if [ -n \"$exists\" ]; then\n",
    "   echo -e \"Bucket gs://${BUCKET_NAME} already exists.\"\n",
    "    \n",
    "else\n",
    "   echo \"Creating a new GCS bucket.\"\n",
    "   gsutil mb -l ${REGION} gs://${BUCKET_NAME}\n",
    "   echo -e \"\\nHere are your current buckets:\"\n",
    "   gsutil ls\n",
    "fi"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Timestamp\n",
    "\n",
    "If you are in a live tutorial session, you might be using a shared test account or project. To avoid name collisions between users on resources created, we create a timestamp for each instance session, and append onto the name of resources which will be created in this tutorial."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from datetime import datetime\n",
    "\n",
    "TIMESTAMP = datetime.now().strftime(\"%Y%m%d%H%M%S\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "PyxoF-iqqD1t"
   },
   "source": [
    "### Import libraries\n",
    "\n",
    "Import the libraries for this tutorial. This tutorial has been tested with **TensorFlow versions 2.3**."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "MEDlLSWK15UL"
   },
   "outputs": [],
   "source": [
    "import tensorflow as tf\n",
    "import pandas as pd\n",
    "\n",
    "# should be >= 2.1\n",
    "print(\"Tensorflow version \" + tf.__version__)\n",
    "if tf.__version__ < \"2.1\":\n",
    "    raise Exception(\"TF 2.1 or greater is required\")\n",
    "\n",
    "!pip install explainable-ai-sdk\n",
    "import explainable_ai_sdk"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "deletable": true,
    "editable": true,
    "id": "aRVMEU2Qshm4"
   },
   "source": [
    "## Download and preprocess the data\n",
    "\n",
    "In this section you'll download the data to train your model from a public GCS bucket. The original data is from the BigQuery datasets linked above. For your convenience, we've joined the London bike and NOAA weather tables, done some preprocessing, and provided a subset of that dataset here.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "v7HLNsvekxvz"
   },
   "outputs": [],
   "source": [
    "# Copy the data to your notebook instance\n",
    "! gsutil cp 'gs://explanations_sample_data/bike-data.csv' ./"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "deletable": true,
    "editable": true,
    "id": "8zr6lj66UlMn"
   },
   "source": [
    "### Read the data with Pandas\n",
    "\n",
    "You'll use Pandas to read the data into a `DataFrame` and then do some additional pre-processing."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "deletable": true,
    "editable": true,
    "id": "Icz22E69smnD"
   },
   "outputs": [],
   "source": [
    "data = pd.read_csv('bike-data.csv')\n",
    "\n",
    "# Shuffle the data\n",
    "data = data.sample(frac=1, random_state=2)\n",
    "\n",
    "# Drop rows with null values\n",
    "data = data[data['wdsp'] != 999.9]\n",
    "data = data[data['dewp'] != 9999.9]\n",
    "\n",
    "# Rename some columns for readability\n",
    "data = data.rename(columns={'day_of_week': 'weekday'})\n",
    "data = data.rename(columns={'max': 'max_temp'})\n",
    "data = data.rename(columns={'dewp': 'dew_point'})\n",
    "\n",
    "# Drop columns you won't use to train this model\n",
    "data = data.drop(columns=['start_station_name', 'end_station_name', 'bike_id', 'snow_ice_pellets'])\n",
    "\n",
    "# Convert trip duration from seconds to minutes so it's easier to understand\n",
    "data['duration'] = data['duration'].apply(lambda x: float(x / 60))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "vxZryg4xmdy0"
   },
   "outputs": [],
   "source": [
    "# Preview the first 5 rows of training data\n",
    "data.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, you will separate the data into features ('data') and labels ('labels')."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Save duration to its own DataFrame and remove it from the original DataFrame\n",
    "labels = data['duration']\n",
    "data = data.drop(columns=['duration'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Split data into train and test sets\n",
    "\n",
    "You'll split your data into train and test sets using an 80 / 20 train / test split."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Use 80/20 train/test split\n",
    "train_size = int(len(data) * .8)\n",
    "print(\"Train size: %d\" % train_size)\n",
    "print(\"Test size: %d\" % (len(data) - train_size))\n",
    "\n",
    "# Split your data into train and test sets\n",
    "train_data = data[:train_size]\n",
    "train_labels = labels[:train_size]\n",
    "\n",
    "test_data = data[train_size:]\n",
    "test_labels = labels[train_size:]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "kV_NEAQwwH0e"
   },
   "source": [
    "## Build, train, and evaluate our model with Keras \n",
    "\n",
    "This section shows how to build, train, evaluate, and get local predictions from a model by using the Keras [Sequential API](https://www.tensorflow.org/guide/keras/sequential_model). The model will takes your 10 features as input and predict the trip duration in minutes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "HCQFzd_YdwLX"
   },
   "outputs": [],
   "source": [
    "# Build your model\n",
    "model = tf.keras.Sequential(name=\"bike_predict\")\n",
    "model.add(tf.keras.layers.Dense(64, input_dim=len(train_data.iloc[0]), activation='relu'))\n",
    "model.add(tf.keras.layers.Dense(32, activation='relu'))\n",
    "model.add(tf.keras.layers.Dense(1))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "UvAcjSUcs_l7"
   },
   "outputs": [],
   "source": [
    "# Compile the model and see a summary\n",
    "optimizer = tf.keras.optimizers.Adam(0.001)\n",
    "model.compile(loss='mean_squared_logarithmic_error', optimizer=optimizer)\n",
    "model.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "GcOkuHPVwjiM"
   },
   "source": [
    "### Create an input data pipeline with tf.data\n",
    "\n",
    "Per best practices, we will use `tf.Data` to create our input data pipeline. Our data is all in an in-memory dataframe, so we will use `tf.data.Dataset.from_tensor_slices` to create our pipeline."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "ZUu9wFklwmm6"
   },
   "outputs": [],
   "source": [
    "batch_size = 256\n",
    "epochs = 3\n",
    "\n",
    "input_train = tf.data.Dataset.from_tensor_slices(train_data)\n",
    "output_train = tf.data.Dataset.from_tensor_slices(train_labels)\n",
    "input_train = input_train.batch(batch_size).repeat()\n",
    "output_train = output_train.batch(batch_size).repeat()\n",
    "train_dataset = tf.data.Dataset.zip((input_train, output_train))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "l98aRzfPwo5e"
   },
   "source": [
    "### Train the model\n",
    "\n",
    "Now we train the model. We will specify a number of epochs which to train the model and tell the model how many steps to expect per epoch."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "h1x_8CR0wtRs"
   },
   "outputs": [],
   "source": [
    "# This will take about a minute to run\n",
    "# To keep training time short, you're not using the full dataset\n",
    "model.fit(train_dataset, steps_per_epoch=train_size // batch_size, epochs=epochs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Evaluate the trained model locally"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Run evaluation\n",
    "results = model.evaluate(test_data, test_labels)\n",
    "print(results)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "bIh6uds2x2tr"
   },
   "outputs": [],
   "source": [
    "# Send test instances to model for prediction\n",
    "predict = model.predict(test_data[:5])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Preview predictions on the first 5 examples from your test dataset\n",
    "for i, val in enumerate(predict):\n",
    "    print('Predicted duration: {}'.format(round(val[0])))\n",
    "    print('Actual duration: {} \\n'.format(test_labels.iloc[i]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "deletable": true,
    "editable": true,
    "id": "gAO6-zv6osJ8"
   },
   "source": [
    "## Export the model as a TF 2.x SavedModel\n",
    "\n",
    "When using TensorFlow 2.x, you export the model as a `SavedModel` and load it into Cloud Storage. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "fbvzBm1lji7b"
   },
   "outputs": [],
   "source": [
    "export_path = 'gs://' + BUCKET_NAME + '/explanations/mymodel'\n",
    "model.save(export_path)\n",
    "print(export_path)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "-f8elyM8KMNX"
   },
   "source": [
    "Use TensorFlow's `saved_model_cli` to inspect the model's SignatureDef. We'll use this information when we deploy our model to AI Explanations in the next section."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "yFg5r-7s1BKr"
   },
   "outputs": [],
   "source": [
    "! saved_model_cli show --dir $export_path --all"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "y270ZNinycoy"
   },
   "source": [
    "## Deploy the model to AI Explanations\n",
    "\n",
    "In order to deploy the model to Explanations, you need to generate an `explanations_metadata.json` file and upload this to the Cloud Storage bucket with your SavedModel. Then you'll deploy the model using `gcloud`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "cUdUVjjGbvQy"
   },
   "source": [
    "### Prepare explanation metadata\n",
    "\n",
    "In order to deploy this model to AI Explanations, you need to create an explanation_metadata.json file with information about your model inputs, outputs, and baseline. You can use the [Explainable AI SDK](https://pypi.org/project/explainable-ai-sdk/) to generate most of the fields. \n",
    "\n",
    "The value for `input_baselines` tells the explanations service what the baseline input should be for your model. Here you're using the median for all of your input features. That means the baseline prediction for this model will be the trip duration your model predicts for the median of each feature in your dataset. \n",
    "\n",
    "Since this model accepts a single numpy array with all numerical feature, you can optionally pass an `index_feature_mapping` list to AI Explanations to make the API response easier to parse. When you provide a list of feature names via this parameter, the service will return a key / value mapping of each feature with its corresponding attribution value."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "UolAW3lcVTGl"
   },
   "outputs": [],
   "source": [
    "# Print the names of your tensors\n",
    "print('Model input tensor: ', model.input.name)\n",
    "print('Model output tensor: ', model.output.name)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "qpZiW9Cq6IY4"
   },
   "outputs": [],
   "source": [
    "from explainable_ai_sdk.metadata.tf.v2 import SavedModelMetadataBuilder\n",
    "builder = SavedModelMetadataBuilder(export_path)\n",
    "builder.set_numeric_metadata(\n",
    "    model.input.name.split(':')[0],\n",
    "    input_baselines=[train_data.median().values.tolist()],\n",
    "    index_feature_mapping=train_data.columns.tolist()\n",
    ")\n",
    "builder.save_metadata(export_path)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "rT3iG5pDdrHi"
   },
   "source": [
    "Since this is a regression model (predicting a numerical value), the baseline prediction will be the same for every example we send to the model. If this were instead a classification model, each class would have a different baseline prediction."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "J6MKKy6Xb2MT"
   },
   "source": [
    "### Create the model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import datetime\n",
    "MODEL = 'bike' + datetime.datetime.now().strftime(\"%d%m%Y%H%M%S\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create the model if it doesn't exist yet (you only need to run this once)\n",
    "! gcloud ai-platform models create $MODEL --enable-logging --region=$REGION"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create the model version \n",
    "\n",
    "Creating the version will take ~5-10 minutes. Note that your first deploy could take longer."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "S2OaOycmb4o0"
   },
   "outputs": [],
   "source": [
    "# Each time you create a version the name should be unique\n",
    "VERSION = 'v1'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "0bwCxEr5b8BP"
   },
   "outputs": [],
   "source": [
    "# Create the version with gcloud\n",
    "explain_method = 'integrated-gradients'\n",
    "! gcloud beta ai-platform versions create $VERSION \\\n",
    "--model $MODEL \\\n",
    "--origin $export_path \\\n",
    "--runtime-version 2.1 \\\n",
    "--framework TENSORFLOW \\\n",
    "--python-version 3.7 \\\n",
    "--machine-type n1-standard-4 \\\n",
    "--explanation-method $explain_method \\\n",
    "--num-integral-steps 25 \\\n",
    "--region $REGION"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Make sure the model deployed correctly. State should be `READY` in the following log\n",
    "! gcloud ai-platform versions describe $VERSION --model $MODEL --region $REGION"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "deletable": true,
    "editable": true,
    "id": "JzevJps9IOcU"
   },
   "source": [
    "## Get predictions and explanations\n",
    "\n",
    "Now that your model is deployed, you can use the AI Platform Prediction API to get feature attributions. You'll pass it a single test example here and see which features were most important in the model's prediction. Here you'll use the [Explainable AI SDK](https://pypi.org/project/explainable-ai-sdk/) to get your prediction and explanation. You can also use `gcloud`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "CJ-2ErWJDvcg"
   },
   "source": [
    "### Format your explanation request\n",
    "\n",
    "To make your AI Explanations request, you need to create a JSON object with your test data for prediction."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Format data for prediction to your model\n",
    "prediction_json = {model.input.name.split(':')[0]: test_data.iloc[0].values.tolist()}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Send the explain request\n",
    "\n",
    "You can use the Explainable AI SDK to send explanation requests to your deployed model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "D_PR2BcHD40-"
   },
   "outputs": [],
   "source": [
    "remote_ig_model = explainable_ai_sdk.load_model_from_ai_platform(project=PROJECT_ID, \n",
    "                                                                 model=MODEL, \n",
    "                                                                 version=VERSION,\n",
    "                                                                 region=REGION)\n",
    "ig_response = remote_ig_model.explain([prediction_json])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "0nKR8RelNnkK"
   },
   "source": [
    "### Understanding the explanations response\n",
    "\n",
    "First, let's look at the trip duration your model predicted and compare it to the actual value."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "825KoNgHR-tv"
   },
   "outputs": [],
   "source": [
    "attr = ig_response[0].get_attribution()\n",
    "\n",
    "predicted = round(attr.example_score, 2)\n",
    "print('Predicted duration: ' + str(predicted) + ' minutes')\n",
    "print('Actual duration: ' + str(test_labels.iloc[0]) + ' minutes')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next let's look at the feature attributions for this particular example. Positive attribution values mean a particular feature pushed your model prediction up by that amount, and vice versa for negative attribution values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ig_response[0].visualize_attributions()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Check your explanations and baselines\n",
    "\n",
    "To better make sense of the feature attributions you're getting, you should compare them with your model's baseline. In most cases, the sum of your attribution values + the baseline should be very close to your model's predicted value for each input. Also note that for regression models, the `baseline_score` returned from AI Explanations will be the same for each example sent to your model. For classification models, each class will have its own baseline.\n",
    "\n",
    "In this section you'll send 10 test examples to your model for prediction in order to compare the feature attributions with the baseline. Then you'll run each test example's attributions through two sanity checks in the `sanity_check_explanations` method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Prepare 10 test examples to your model for prediction\n",
    "pred_batch = []\n",
    "for i in range(10):\n",
    "    pred_batch.append({model.input.name.split(':')[0]: test_data.iloc[i].values.tolist()})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "test_response = remote_ig_model.explain(pred_batch)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the function below you perform two sanity checks for models using Integrated Gradient (IG) explanations and one sanity check for models using Sampled Shapley."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def sanity_check_explanations(example, mean_tgt_value=None, variance_tgt_value=None):\n",
    "    passed_test = 0\n",
    "    total_test = 1\n",
    "    # `attributions` is a dict where keys are the feature names\n",
    "    # and values are the feature attributions for each feature\n",
    "    attr = example.get_attribution()\n",
    "    baseline_score = attr.baseline_score\n",
    "    # sum_with_baseline = np.sum(attribution_vals) + baseline_score\n",
    "    predicted_val = attr.example_score\n",
    "\n",
    "    # Sanity check 1\n",
    "    # The prediction at the input is equal to that at the baseline.\n",
    "    #  Please use a different baseline. Some suggestions are: random input, training\n",
    "    #  set mean.\n",
    "    if abs(predicted_val - baseline_score) <= 0.05:\n",
    "        print('Warning: example score and baseline score are too close.')\n",
    "        print('You might not get attributions.')\n",
    "    else:\n",
    "        passed_test += 1\n",
    "\n",
    "    # Sanity check 2 (only for models using Integrated Gradient explanations)\n",
    "    # Ideally, the sum of the integrated gradients must be equal to the difference\n",
    "    # in the prediction probability at the input and baseline. Any discrepency in\n",
    "    # these two values is due to the errors in approximating the integral.\n",
    "    if explain_method == 'integrated-gradients':\n",
    "        total_test += 1\n",
    "        want_integral = predicted_val - baseline_score\n",
    "        got_integral = sum(attr.post_processed_attributions.values())\n",
    "        if abs(want_integral - got_integral) / abs(want_integral) > 0.05:\n",
    "            print('Warning: Integral approximation error exceeds 5%.')\n",
    "            print('Please try increasing the number of integrated gradient steps.')\n",
    "        else:\n",
    "            passed_test += 1\n",
    "\n",
    "    print(passed_test, ' out of ', total_test, ' sanity checks passed.')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for response in test_response:\n",
    "    sanity_check_explanations(response)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Understanding AI Explanations with the What-If Tool\n",
    "\n",
    "In this section you'll use the [What-If Tool](https://pair-code.github.io/what-if-tool/) to better understand how your model is making predictions. See the cell below the What-if Tool for visualization ideas."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The What-If-Tool expects data with keys for each feature name, but your model expects a flat list. The functions below convert data to the format required by the What-If Tool."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# This is the number of data points you'll send to the What-if Tool\n",
    "WHAT_IF_TOOL_SIZE = 500\n",
    "\n",
    "from witwidget.notebook.visualization import WitWidget, WitConfigBuilder\n",
    "\n",
    "\n",
    "def create_list(ex_dict):\n",
    "    new_list = []\n",
    "    for i in feature_names:\n",
    "        new_list.append(ex_dict[i])\n",
    "    return new_list\n",
    "\n",
    "\n",
    "def example_dict_to_input(example_dict):\n",
    "    return {'dense_input': create_list(example_dict)}\n",
    "\n",
    "\n",
    "from collections import OrderedDict\n",
    "wit_data = test_data.iloc[:WHAT_IF_TOOL_SIZE].copy()\n",
    "wit_data['duration'] = test_labels[:WHAT_IF_TOOL_SIZE]\n",
    "wit_data_dict = wit_data.to_dict(orient='records', into=OrderedDict)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "config_builder = WitConfigBuilder(\n",
    "    wit_data_dict\n",
    ").set_ai_platform_model(\n",
    "    PROJECT_ID,\n",
    "    MODEL,\n",
    "    VERSION,\n",
    "    adjust_example=example_dict_to_input\n",
    ").set_target_feature('duration').set_model_type('regression')\n",
    "\n",
    "WitWidget(config_builder)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### What-If Tool visualization ideas\n",
    "\n",
    "On the x-axis, you'll see the predicted trip duration for the test inputs you passed to the What-If Tool. Each circle represents one of your test examples. If you click on a circle, you'll be able to see the feature values for that example along with the attribution values for each feature. \n",
    "\n",
    "* You can edit individual feature values and re-run prediction directly within the What-If Tool. Try changing `distance`, click **Run inference** and see how that affects the model's prediction\n",
    "* You can sort features for an individual example by their attribution value, try changing the sort from the attributions dropdown\n",
    "* The What-If Tool also lets you create custom visualizations. You can do this by changing the values in the dropdown menus above the scatter plot visualization. For example, you can sort data points by inference error, or by their similarity to a single datapoint."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Cleaning up\n",
    "\n",
    "To clean up all GCP resources used in this project, you can [delete the GCP\n",
    "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n",
    "\n",
    "Alternatively, you can clean up individual resources by running the following\n",
    "commands:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Delete model version resource\n",
    "! gcloud ai-platform versions delete $VERSION --quiet --model $MODEL\n",
    "\n",
    "# Delete model resource\n",
    "! gcloud ai-platform models delete $MODEL --quiet\n",
    "\n",
    "# Delete Cloud Storage objects that were created\n",
    "! gsutil -m rm -r gs://$BUCKET_NAME"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If your Cloud Storage bucket doesn't contain any other objects and you would like to delete it, run `gsutil rm -r gs://$BUCKET_NAME`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## What's next?\n",
    "\n",
    "To learn more about AI Explanations or the What-if Tool, check out the resources here.\n",
    "\n",
    "* [AI Explanations documentation](cloud.google.com/ml-engine/docs/ai-explanations)\n",
    "* [Documentation for using the What-if Tool with Cloud AI Platform models ](https://cloud.google.com/ml-engine/docs/using-what-if-tool) \n",
    "* [What-If Tool documentation and demos](https://pair-code.github.io/what-if-tool/)\n",
    "* [Integrated gradients paper](https://arxiv.org/abs/1703.01365)"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "collapsed_sections": [],
   "name": "AI Explanations on CAIP ",
   "private_outputs": true,
   "provenance": []
  },
  "environment": {
   "name": "tf2-gpu.2-3.m65",
   "type": "gcloud",
   "uri": "gcr.io/deeplearning-platform-release/tf2-gpu.2-3:m65"
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
